Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Improved automatic classification algorithm of software bug report in cloud environment
HUANG Wei, LIN Jie, JIANG Yu'e
Journal of Computer Applications    2016, 36 (5): 1212-1215.   DOI: 10.11772/j.issn.1001-9081.2016.05.1212
Abstract517)      PDF (705KB)(399)       Save
User-submitted bug reports are arbitrary and subjective. The accuracy of automatic classification of bug reports is not ideal. Hence it requires many human labors to intervention. With the bug reports database growing bigger and bigger, the problem of improving the accuracy of automatic classification of these reports is becoming urgent. A TF-IDF (Term Frequency-Inverse Document Freqency) based Naive Bayes (NB) algorithm was proposed. It not only considered the relationship of a term in different classes but also the relationship of a term inside a class. It was also implemented in distributed parallel environment of MapReduce model in Hadoop platform. The experimental results show that the proposed Naive Bayes algorithm improves the performance of F1 measument to 71%, which is 27 percentage points higher than the state-of-the-art method. And it is able to deal with massive amounts of data in distributed way by addding computational node to offer shorter running time and has better effective performance.
Reference | Related Articles | Metrics
Local similarity detection algorithm for time series based on distributed architecture
LIN Yang, JIANG Yu'e, LIN Jie
Journal of Computer Applications    2016, 36 (12): 3285-3291.   DOI: 10.11772/j.issn.1001-9081.2016.12.3285
Abstract631)      PDF (1125KB)(481)       Save
The CrossMatch algorithm based on the idea of Dynamic Time Warping (DTW) can be used to solve the problems of local similarity between time series. However, due to the high complexity of time and space, large amounts of computing resources are required. Thus, it is almost impossible to be used for long sequences. To solve the above mentioned problems, a new algorithm for local similarity detection based on distributed platform was proposed. The proposed algorithm was a distributed solution for CrossMatch. The problem of insufficient computing resources including time and space requirement was solved. Firstly, the series should be splited and distributed on several nodes. Secondly, the local similarity of every node's own series was dealt with. Finally, the results would be merged and assembled in order to find the local similarity of series. The experimental results show that the accuracy between the proposed algorithm and the CrossMatch algorithm is similar, and the proposed algorithm uses less time. The improved distributed algorithm can not only solve the computation problem of long sequence of time series which can not be processed by a single machine, but also improve the running speed by increasing the number of parallel computing nodes.
Reference | Related Articles | Metrics